Feature(MInference): add triton-based decoding in case flash_attn is not available #35

liyucheng09 · 2024-07-13T16:03:22Z (description edited by iofu728)

What does this PR do?

Feature

Add Triton-based decoding for HF mode, used as a fallback when flash_attn is not available.
The vLLM mode stays the same, as it does not require flash_attn during decoding.
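A minimal sketch of the fallback logic described above, assuming a simple import-availability check decides the decoding backend. The names `has_module` and `select_decoding_backend` are hypothetical and are not MInference's actual API; the real PR may detect flash_attn differently.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be found in the current environment (without importing it)."""
    return importlib.util.find_spec(name) is not None


def select_decoding_backend(mode: str) -> str:
    """Hypothetical helper: choose which attention implementation handles decoding."""
    if mode == "vllm":
        # vLLM mode is unchanged: it does not need flash_attn for decoding.
        return "vllm"
    if mode == "hf":
        if has_module("flash_attn"):
            return "flash_attn"
        # Fallback introduced by this PR: Triton-based decoding.
        return "triton"
    raise ValueError(f"unknown mode: {mode!r}")
```

Checking with `find_spec` instead of a bare `import flash_attn` avoids loading the CUDA extension just to test for its presence.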

Bug Fixed

Fixes #23 ([Question]: Is A6000 supported?) and #37 ([Question]: "Building wheel for flash_attn (setup.py)" for a long time without any notification).

UnitTest

Passed locally.

Who can review?

@iofu728

Commit aa0c5e9: add triton-based decoding in case flash_attn is not available

iofu728 assigned liyucheng09 and iofu728 Jul 15, 2024

iofu728 added the feature label Jul 15, 2024

Commit 2525744: Feature(MInference): remove flash_attn dependency

iofu728 changed the title from "add triton-based decoding in case flash_attn is not available" to "Feature(MInference): add triton-based decoding in case flash_attn is not available" Jul 15, 2024

iofu728 approved these changes Jul 15, 2024


iofu728 merged commit 50d17d9 into main Jul 15, 2024
1 check passed

iofu728 deleted the decoding-dev branch July 15, 2024 05:46
